Dynamic, Graph-Based Risk Assessments for the Detection of Violent Extremist Radicalization Trajectories Using Large Scale Social and Behavioral Data, United States, Canada, United Kingdom, Germany, 1994-2020 (ICPSR 38135)
Version Date: Jan 13, 2022
Principal Investigator(s):
Anura P. Jayasumana, Colorado State University;
Jytte Klausen, Brandeis University
https://doi.org/10.3886/ICPSR38135.v1
Version V1
Summary
This project examines the trajectory of radicalization of jihadists and Incels with two broad objectives in mind. First, to develop new integrated computational technology that can mine, monitor, and screen for the occurrence of behaviors associated with dangerously escalating extremism in large heterogeneous databases and provide early warnings of individuals or groups on behavioral trajectories toward extremist violence. Second, to harness data science methodologies to enable rapid, semi-automated support for law enforcement analysts and social science researchers to produce structured behavioral indicator profiles from text sources.
The study operated from the premise that, because violent extremism is a rare, complex phenomenon, it is futile to search for a profile of extremism. Rather, it is better to focus on explaining how people come to embrace violent extremism. This path, referred to here as a radicalization trajectory, implies that an arc exists leading the perpetrator from entertaining extremist ideas to action, and that there is a somewhat predictable pathway from a normal, if perhaps angry, state to the perpetration of a violent attack in the name of the ideology. Two teams were combined to analyze radicalization trajectories: data collection and analysis led by Brandeis University and technology development led by Colorado State University (CSU).
The questions revolving around the technological development were as follows: Can tools that rigorously examine and account for the activities of close associates better predict the likelihood that an individual would engage in violent extremism? Which risk assessment indicators for violent extremism in the extant literature are detectable via automated or semi-automated technologies, and what databases and datasets must be integrated to facilitate this detection? Can computationally efficient tools be used to mine these databases for the specific purposes of monitoring and screening for individuals and small groups posing a significant risk for violence?
Users should refer to the data collection notes field below for additional information about study citation.
Citation
Funding
Subject Terms
Geographic Coverage
Smallest Geographic Unit
Country
Restrictions
Access to these data is restricted. Users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reasons for the request, and obtain IRB approval or notice of exemption for their research.
Distributor(s)
Time Period(s)
Date of Collection
Data Collection Notes
At PI request, dataset 1 should be attributed to Anura P. Jayasumana while datasets 2-6 should be attributed to Jytte Klausen. Please refer to the PI user guide for additional information.
Study Purpose
The purpose of this study was to analyze the trajectory of individuals who undergo extremist radicalization, with an emphasis on jihadists and involuntary celibates (Incels). The goal of the research was to produce new methods to mine for instances of behaviors associated with dangerous extremism in large databases and provide warnings of individuals or groups whose behavioral trajectory is trending towards extremist violence. Furthermore, researchers sought to see if data science technology could be used to develop rapid, semi-automated support for law enforcement and social scientists to produce behavioral profiles from text resources. Variables included the initial year of radicalization, radical behaviors and cues, text descriptions of radical events, along with criminal and mental health backgrounds. Demographics collected include sex, ethnicity, and country of residence.
Study Design
To be included in the analysis, individuals had to meet the researchers' criteria for a jihadist or an Incel.
For jihadists, these three standards must be met:
For Incels, the following two conditions must be met:
The data collection and analysis led by Brandeis University centered on three data types: (1) demographic data about known jihadist terrorist offenders from the United States and the United Kingdom and a small case-control dataset composed of Incels; (2) information about observed behavioral changes and the timing of these behaviors; and (3) a large collection of text data indicative of cues to such behaviors. Twenty-four different behavioral indicators were used to infer when an individual embraced violent extremism, along with timestamps to chart the individual's radicalization process. Demographic information and social networks were also coded to analyze distinctive pathways taken by different subgroups. Text data came from a variety of publicly available sources, including news articles and court documents. The Brandeis team used a standard relational database management system and Structured Query Language (SQL) to support the coordination of structured (numeric) and unstructured (text) data.
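As a rough illustration of how a relational database can coordinate structured and unstructured data of this kind, the sketch below links individuals, timestamped indicator events, and text excerpts, then queries one person's events in date order. It is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and are not taken from the study's actual schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE individual (
            id INTEGER PRIMARY KEY,
            country TEXT,
            birth_year INTEGER
        );
        CREATE TABLE indicator_event (
            id INTEGER PRIMARY KEY,
            individual_id INTEGER NOT NULL REFERENCES individual(id),
            indicator TEXT NOT NULL,
            event_date TEXT NOT NULL  -- ISO 8601, so text order is date order
        );
        CREATE TABLE text_excerpt (
            id INTEGER PRIMARY KEY,
            event_id INTEGER NOT NULL REFERENCES indicator_event(id),
            source TEXT,
            excerpt TEXT
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.execute("INSERT INTO individual VALUES (1, 'US', 1990)")
    conn.executemany(
        "INSERT INTO indicator_event VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Desire for Action", "2015-06-01"),
            (2, 1, "Seeking Religious Authority", "2014-02-10"),
        ],
    )
    conn.execute(
        "INSERT INTO text_excerpt VALUES (1, 2, 'court document', 'placeholder excerpt')"
    )
    return conn


def timeline(conn, individual_id):
    """Return (date, indicator, excerpt) rows for one person, oldest first."""
    return conn.execute(
        """
        SELECT e.event_date, e.indicator, t.excerpt
        FROM indicator_event AS e
        LEFT JOIN text_excerpt AS t ON t.event_id = e.id
        WHERE e.individual_id = ?
        ORDER BY e.event_date
        """,
        (individual_id,),
    ).fetchall()
```

The LEFT JOIN keeps events that have no supporting text excerpt, so the structured timeline stays complete even where the unstructured evidence is missing.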
The CSU team developed processes to assist or automate data collection and analysis with software modules created in Python. Natural Language Processing techniques were used to recognize named entities and pronouns in text, match them based on specific linguistic rules, and classify them under specific indicator classes created by the researchers. Examples include Seeking Religious Authority, Desire for Action, and Issue a Threat. Graph database tools were also used to formulate and visualize radicalization trajectories. Investigative Search for Graph Trajectories (INSiGHT) helped to identify individuals or small groups who follow radicalization patterns and to generate a chronological radicalization trajectory.
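The general idea of tagging text with indicator classes and then ordering the tagged events in time can be sketched in a few lines of Python. This is a deliberately simplified stand-in. It uses plain substring matching in place of named-entity recognition and linguistic rules, and a sorted list in place of a graph database, so it does not represent the CSU modules or INSiGHT. The three class names come from the description above; the cue phrases, the `Event` type, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cue phrases; the study's actual rules are not described here.
CUES = {
    "Seeking Religious Authority": ("sought guidance from", "asked a cleric"),
    "Desire for Action": ("wanted to act", "wanted to travel"),
    "Issue a Threat": ("threatened",),
}


@dataclass(frozen=True)
class Event:
    person: str
    when: date
    indicator: str


def classify(sentence):
    """Return every indicator class whose cue phrase appears in the sentence."""
    lowered = sentence.lower()
    return [
        label
        for label, cues in CUES.items()
        if any(cue in lowered for cue in cues)
    ]


def trajectory(records, person):
    """Build a chronological list of tagged events for one person.

    `records` is an iterable of (person, date, sentence) tuples.
    """
    events = [
        Event(name, when, label)
        for name, when, sentence in records
        if name == person
        for label in classify(sentence)
    ]
    return sorted(events, key=lambda event: event.when)
```

Calling `trajectory(records, "A")` on a list of dated sentences returns that person's matched indicators, oldest first. A production system would need entity and pronoun resolution to attribute each sentence to the right person, which this sketch assumes has already been done.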
Time Method
Universe
Jihadists from the United States and Canada, along with Incels from Germany, Canada, the United States, and United Kingdom.
Unit(s) of Observation
Data Type(s)
Description of Variables
The Synthetic Dataset contains 20 variables related to dates of behavioral indicators of radicalization trajectories.
The Incel Demographic Dataset variables capture the country, sex, year of birth, year of first radicalization action, nationality, mental illness, and criminality of subjects.
The Incel Trajectory Dataset tracks events that make up an individual Incel's radicalization trajectory. Variables include behaviors and cues associated with radicalization behaviors, and a brief description of the event in question.
The Jihadist Demographic Dataset includes the country of residence, sex, year of birth, and ethno-nationality of jihadist individuals. Variables also record whether subjects converted to Islam, their first year of radicalization action, mental illness, criminality, whether their terrorist action was violent, and whether they are foreign fighters.
The Jihadist Text Dataset contains excerpts of text related to specific indicator events copied from public sources. Researchers also extracted keywords from the source text which are most relevant to the specific indicator in question.
The Jihadist Trajectory Dataset tracks events that make up an individual jihadist's radicalization trajectory. Variables include behaviors and cues associated with radicalization behaviors, and a brief description of the event in question.
Response Rates
Not applicable
Presence of Common Scales
none
The public-use data files in this collection are available for access by the general public. Access does not require affiliation with an ICPSR member institution.
ICPSR usually offers files in multiple formats for researchers to be able to access data and documentation in formats that work well within their needs. If you have questions about the accessibility of materials distributed by ICPSR or require further assistance, please visit ICPSR’s Accessibility Center.
One or more files in this data collection have special restrictions. Restricted data files are not available for direct download from the website; click on the Restricted Data button to learn more.

This dataset is maintained and distributed by the National Archive of Criminal Justice Data (NACJD), the criminal justice archive within ICPSR. NACJD is primarily sponsored by three agencies within the U.S. Department of Justice: the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Prevention.